-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add implementation of MultipartFormData
streaming encoder
#200
Add implementation of MultipartFormData
streaming encoder
#200
Conversation
591475c
to
1c57fa0
Compare
const response = await fetch("https://http-me.glitch.me/meow?header=cat:é");
strictEqual(response.headers.get('cat'), "é"); |
1c57fa0
to
655b57d
Compare
@andreiltd, is this ready to review, or are you still expecting to make changes to it? |
Yes, this is ready to review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either I'm missing something in looking at the diff, or there is some code missing—see the inline comment on form-data-encoder.h
.
One thing that I can't fully evaluate, because it'll presumably be part of the implementation of encode_stream
: IIUC, field names, will always be first have their newlines normalized to CRLF
, and then have those escaped. It'd be good to fold those into one operation, if possible.
Otherwise, I left a few comments and suggestions. I'm a bit concerned about allocation and general failure handling, so it'd be good to go over those aspects in some detail.
bool MultipartFormDataImpl::handle_entry_header(JSContext *cx, StreamContext &stream) { | ||
auto entry = stream.entries->begin()[chunk_idx_]; | ||
auto header = fmt::memory_buffer(); | ||
auto name = escape_newlines(entry.name).value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a null check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is overloaded but the version that doesn't take JSContetext
is infallible. I can change this to separate function like escape_name
and escape_name_val
if we prefer explicit return value. Anyway, I've added the assertion for now.
Hi @tschneidereit, as always, thank you so much for a through review!
So the spec says this:
It's really hard to follow the spec closely because the implementation is state-machine-based, and a lot of actions that the spec calls for correspond to different states in the implementation and cannot be done together. For example, processing both names and values: names are part of the entry-header state, whereas values are handled in the entry-body state. |
I've adopted a bunch of Unfortunately, I wasn't able to run those in
const formDataText = await (await fetch(
`/fetch/api/resources/echo-content.py`,
{
method: "POST",
body: formData,
},
)).text(); This returns an empty string. |
That's a great idea!
Can you say more about how you're trying to run the test? I can't see where in the PR the Taking a step back though, to fully adapt a test from WPT requires at least these steps:
|
Oh, I was just trying to run The failures that I described were from running
That's a good point. Would |
It would be, yes. Though if you're only doing this to work around weird issues with running WPT tests, then I would overall prefer figuring out what causes them. Can you give me steps to reproduce for the issue you were seeing with this test? |
I added the test like this: diff --git a/tests/wpt-harness/tests.json b/tests/wpt-harness/tests.json
index 349baa1..fb757b9 100644
--- a/tests/wpt-harness/tests.json
+++ b/tests/wpt-harness/tests.json
@@ -175,6 +175,7 @@
"FileAPI/blob/Blob-stream.any.js",
"FileAPI/blob/Blob-text.any.js",
"FileAPI/file/File-constructor.any.js",
+ "FileAPI/file/send-file-formdata.any.js",
"hr-time/basic.any.js",
"hr-time/idlharness.any.js",
"hr-time/monotonic-clock.any.js", and then running just wpt-test FileAPI/file/send-file-formdata.any.js gives me this error:
|
Just looking at how we import meta scripts here it looks like the meta scripts are evaluated separately from the test, shouldn't the meta and the test be concatenated instead? |
I don't entirely remember why I landed on this exact way of loading META scripts. I do remember that it was non-trivial to get things working, and included this weird thing where we replace |
Yeah, it would be really nice to have those tests working, it's worth dozens of wpt passes. Let me know if you plan to investigate otherwise I will allocate some time to it for next week. 🙂 |
Merging #209 Should fix failing wpt tests in this PR. |
This is looking quite excellent! 🥳 I think it's very close to being ready to land—however, before doing another in-depth review, it'd be great if you could go through the code and add a lot more links to the spec algorithms implemented. Additionally, adding comments explaining non-obvious design choices would be good, too. In particular, explaining the boundary string code with a comment covering what you explained in the initial description in this PR would be good. (It's very much okay to have these comments be long: we'll be grateful to have detailed explanations of all this stuff in the future!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic work!
I left a last round of feedback, but none of that should require another look after you've addressed it, so I'm happy to merge once you let me know it's ready!
|
||
auto boundary = MultipartFormData::boundary(encoder); | ||
auto type = "multipart/form-data; boundary=" + boundary; | ||
host_type_str = type.c_str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just do host_type_str = type
here, as HostString
has a constructor taking a string_view
. That at least saves one copy: .c_str
does an internal copy to append the trailing \0
. It's a bit unfortunate that we have to do another copy here, but this is all expensive enough that it probably doesn't matter at all. (Plus, LLVM might actually eliminate it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to cleanup this code a bit. I think the problem here is that at the top of this function the content_type
is defined as const char *content_type
and then when we pass it to the Header::set_valid_if_undefined
it is implicitly converted into a string_view
. This conversion is UB if there is no trailing \0
.
I changed this to use a string_view
everywhere instead. This also eliminates copying.
Oh, and I forgot to mention: the comments you added are really excellent and made reviewing this much easier. Thank you! |
Hey @tschneidereit , thanks for review! I I've addressed all the issues from the last round except for the constructor suggestion. See my comment: #200 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳
I did a quick comparison of encoded
FormData
between SM and Chrome using following JS code:Simple FormData
The results are:
I also did some manual fuzzing to test the streaming logic by varying the buffer sizes the encoder writes into. Specifically, I tested these buffer sizes in bytes: 1, 2, 4, 8, 32, 1024, and 8192 by changing the size in the implementation. Though, having some unit test framework would be nice.
Relevant RFC: https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1